Apparatus and method for coding an information signal into
a data stream, converting the data stream and decoding the
data stream

Background of the invention

1. Technical field of the invention

The present invention relates to information signal coding
schemes in general and, in particular, to coding schemes
suitable for single media or multimedia signal coding, such 
as video coding or audio coding. 

2. Description of the prior art 

The MPEG-2 video coding standard, which was developed about 
10 years ago primarily as an extension of prior MPEG-1 
video capability with support of interlaced video coding, 
was an enabling technology for digital television systems 
worldwide. It is widely used for the transmission of
standard definition (SD) and high definition (HD) TV signals
over satellite, cable, and terrestrial emission and the 
storage of high-quality SD video signals onto DVDs. 

However, an increasing number of services and the growing
popularity of high definition TV are creating greater needs
for higher coding efficiency. Moreover, other transmission
media such as cable modem, xDSL or UMTS offer much lower
data rates than broadcast channels, and enhanced coding
efficiency can enable the transmission of more video
channels or higher quality video representations within
existing digital transmission capacities.

Video coding for telecommunication applications has evolved
through the development of the MPEG-2 coding standard, and
has diversified from ISDN and T1/E1 service to embrace
PSTN, mobile wireless networks, and LAN/Internet network
delivery. Despite this evolution, there is still a need to
maximize coding efficiency while dealing with the
diversification of network types and their characteristic
formatting and loss/error robustness requirements.

Recently, the MPEG-4 Visual standard has also begun to
emerge in use in some application domains of the prior
coding standards. It has provided video shape coding
capability, and has similarly worked toward broadening the
range of environments for digital video use.

However, the video coding schemes available today have in
common that it is difficult to adapt an already coded video
stream on its way from its creation to its arrival at a
receiver in order, for example, to adapt the performance
level of the coded video stream to the performance of the
receiver or to the performance of the transmission link
connecting the source of the coded video stream and the
receiver.

For example, an MPEG-4 data stream may be provided at a
video server in Dolby Surround, thus providing a relatively
large number of audio channels. However, the receiver may
be a device capable of reproducing only mono audio
information. In this case, transferring the coded video
stream at its full performance level, i.e. incorporating all
audio channels, would waste transmission link capacity.
Thus, it would be advantageous if a gateway between the
coded video stream source and the receiver could convert
the coded video stream from its initial performance level
to a lower performance level. However, in available video
coding schemes, the gateway cannot convert a video data
stream from a higher performance level to a lower
performance level merely by discarding the portion of the
coded video data stream pertaining to the excess channels
without manipulating the remainder of the coded video
stream, i.e. the portion concerning both the higher
performance level and the lower performance level.

Therefore, there is a need for a video coding scheme which
allows a higher "network friendliness" to enable simple and
effective customization for a broad variety of systems. To
be more specific, the video coding scheme should allow
greater customization of carrying the video content in a
manner appropriate for each specific network. Moreover, the
customization and adaptation of coded video streams should
be possible with reasonable effort.

Summary of the invention

It is the object of the present invention to provide an
information signal coding scheme which enables more
customization and adaptation of the coded data stream with
reasonable effort.

In accordance with a first aspect of the present invention,
this object is achieved by an apparatus for coding an
information signal, the apparatus comprising means for
processing the information signal in order to obtain data
packets, each data packet being of a data packet type of a
predetermined set of data packet types, at least one of the
data packet types being a removable data packet type; and
means for arranging the data packets into a data stream so
that the data stream comprises consecutive access units of
consecutive data packets, so that the data packets within
each access unit are arranged in accordance with a
predetermined order among the data packet types, wherein the
means for processing and the means for arranging are
adapted so that even when a data packet of the removable
data packet type is removed from the data stream, borders
between successive access units are detectable from the
data stream by use of the predetermined order.

In accordance with a second aspect of the present
invention, this object is achieved by an apparatus for
converting a data stream representing a coded version of an
information signal from a first performance level to a
second performance level, the data stream comprising
consecutive access units of consecutive data packets, each
data packet being of a data packet type of a predetermined
set of data packet types, at least one of the data packet
types being a removable data packet type, and the data
packets within each access unit being arranged in accordance
with a predetermined order among the data packet types such
that even when a data packet of the removable data packet
type is removed from the data stream, borders between
successive access units are detectable from the data stream
by use of the predetermined order, the apparatus comprising
means for removing at least one data block of the removable
data packet type from the bit stream without manipulating
the remainder of the data stream.

In accordance with a third aspect of the present invention,
this object is achieved by an apparatus for decoding a data
stream representing a coded version of an information
signal, the data stream comprising consecutive access units
of consecutive data packets, each data packet being of a
data packet type of a predetermined set of data packet
types, at least one of the data packet types being a
removable data packet type, and the data packets within each
access unit being arranged in accordance with a
predetermined order among the data packet types, such that
even when a data packet of the removable data packet type is
removed from the data stream, borders between successive
access units are detectable from the data stream by use of
the predetermined order, the apparatus comprising means for
detecting a border between successive access units by use of
the predetermined order; and means for decoding the
successive access units.

In accordance with a fourth aspect of the present
invention, this object is achieved by a method for coding an
information signal, the method comprising processing the
information signal in order to obtain data packets, each
data packet being of a data packet type of a predetermined
set of data packet types, at least one of the data packet
types being a removable data packet type; and arranging the
data packets into a data stream so that the data stream
comprises consecutive access units of consecutive data
packets, so that the data packets within each access unit
are arranged in accordance with a predetermined order among
the data packet types, wherein the steps of processing and
arranging are adapted so that even when a data packet of the
removable data packet type is removed from the data stream,
borders between successive access units are detectable from
the data stream by use of the predetermined order.

In accordance with a fifth aspect of the present invention,
this object is achieved by a method for converting a data
stream representing a coded version of an information
signal from a first performance level to a second
performance level, the data stream comprising consecutive
access units of consecutive data packets, each data packet
being of a data packet type of a predetermined set of data
packet types, at least one of the data packet types being a
removable data packet type, and the data packets within each
access unit being arranged in accordance with a
predetermined order among the data packet types such that
even when a data packet of the removable data packet type is
removed from the data stream, borders between successive
access units are detectable from the data stream by use of
the predetermined order, the method comprising removing at
least one data block of the removable data packet type from
the bit stream without manipulating the remainder of the
data stream.

In accordance with a sixth aspect of the present invention,
this object is achieved by a method for decoding a data
stream representing a coded version of an information
signal, the data stream comprising consecutive access units
of consecutive data packets, each data packet being of a
data packet type of a predetermined set of data packet
types, at least one of the data packet types being a
removable data packet type, and the data packets within each
access unit being arranged in accordance with a
predetermined order among the data packet types, such that
even when a data packet of the removable data packet type is
removed from the data stream, borders between successive
access units are detectable from the data stream by use of
the predetermined order, the method comprising detecting a
border between successive access units by use of the
predetermined order; and decoding the successive access
units.

In accordance with a seventh aspect of the present
invention, this object is achieved by a data stream
representing a coded version of a video or audio signal, the
data stream comprising consecutive access units of
consecutive data packets, each data packet being of a data
packet type of a predetermined set of data packet types, at
least one of the data packet types being a removable data
packet type, and the data packets within each access unit
being arranged in accordance with a predetermined order
among the data packet types such that even when a data
packet of the removable data packet type is removed from
the data stream, borders between successive access units
are detectable from the data stream by use of the
predetermined order.

The present invention is based on the finding that
customization and adaptation of coded data streams may be
achieved by processing the information signal such that the
various syntax structures obtained by pre-coding the
information signal are placed into logical data packets,
each of which is associated with a specific data packet
type of a predetermined set of data packet types, and by
defining a predetermined order of data packet types within
one access unit of data packets. The consecutive access
units in the data stream may, for example, correspond to
different time portions of the information signal. By
defining the predetermined order among the data packet
types it is possible, at the decoder side, to detect the
borders between successive access units even when removable
data packets are removed from the data stream on the way
from the data stream source to the decoder, without
incorporating any hints into the remainder of the data
stream. Due to this, decoders reliably detect the beginnings
and endings of access units and are therefore not liable to
a buffer overflow despite a removal of data packets from
the data stream before arrival at the decoder.

The removable data packets may be data packets which are
negligible or not necessary for decoding the values of the
samples in the information signal. In this case, the
removable data packets may contain redundant information
concerning the video content. Alternatively, such removable
data packets may contain supplemental enhancement
information, such as timing information and other
supplemental data that may enhance usability of the decoded
information signal obtained from the data stream but are
not necessary for decoding the values of the samples of the
information signal.

However, the removable data packets may also contain
parameter sets, such as important header data, that can
apply to a large number of other data packets. In this case,
such removable data packets contain information necessary
for retrieval of the video content from the data stream.
Therefore, when such data packets are removed, they are
transferred to the receiver in another way, for example, by
use of a different transmission link or by inserting the
removed data packets somewhere else into the data stream in
accordance with the predetermined order among the data
packet types, so as not to accidentally create a condition
in the data stream that defines the beginning of a new
access unit in the middle of an access unit.

Thus, it is an advantage of the present invention that an
information signal may be coded into a data stream composed
of consecutive data packets, and that removable data
packets may be removed from the data stream without having
to manipulate the remainder of the data stream, while the
order among the data packet types within access units is
nevertheless maintained so that borders between successive
access units are still derivable by use of the order,
preferably merely from knowledge of the order.

Moreover, another advantage of the present invention is the
higher flexibility in arranging the data packets in the
data stream as long as the arrangement complies with the
predetermined order among the data packet types. This
allows duplicating data packets for redundancy enhancement
purposes as well as adapting the performance level of the
data stream to the receiving or transmission environment.

Brief description of the drawings

Preferred embodiments of the present invention are described
in more detail below with respect to the Figures.

Fig. 1 shows a schematic diagram illustrating a crea- 
tion, conversion and decoding of a data stream in 
accordance with an embodiment of the present 
invention. 

Fig. 2 shows a block diagram of a system in which the 
procedures of Fig. 1 may be realized in accordance 
with an embodiment of the present invention. 

Fig. 3 shows a block diagram of an encoder environment 
in accordance with an embodiment of the present inven- 
tion. 

Fig. 4 shows a schematic diagram illustrating the
structure of a data stream in accordance with a specific
embodiment of the present invention.

Fig. 5 shows a syntax diagram for illustrating the
structure of an access unit in accordance with the
specific embodiment of Fig. 4.

Fig. 6 shows a flow diagram for illustrating a possible
mode of operation in the gateway of Fig. 2 in accor- 
dance with an embodiment of the present invention. 

Fig. 7 shows a schematic diagram illustrating the
parameter set transmission via an extra transmission
link between encoder and decoder in accordance with an
embodiment of the present invention.

Fig. 8 shows a flow diagram illustrating the operation
of the decoder of Fig. 2 in accordance with the
specific embodiment of the present invention.

Detailed description of preferred embodiments of the present
invention

Before describing preferred embodiments of the present
invention with respect to the figures, it is noted that like
elements in the figures are designated by like reference
numbers, and that a repeated explanation of these elements
is omitted.

Fig. 1 shows the creation, conversion and decoding of a
data stream in accordance with an embodiment of the present
invention, the data stream representing a coded version of
an information signal, such as an audio, video or
multimedia signal.

In Fig. 1, the information signal is indicated by reference
number 10. Although the information signal 10 could be any
time-domain or time-dependent information signal, the
information signal 10 is illustrated as a multimedia signal
comprised of a video signal or video content 10a and an
audio signal or audio content 10b. The video content 10a is
illustrated as being composed of a sequence of pictures 12,
while the audio signal 10b is illustrated as comprising a
sequence of audio samples 14, the sequence extending along
the time axis t.

Although the information signal 10 could be handled, such
as stored and transferred, in an un-coded digital manner,
the information signal 10 is encoded in order to compress
the information signal, i.e. to reduce the amount of data
necessary in order to represent the information signal.
This encoding process is indicated in Fig. 1 by arrow 16,
while an encoder performing the encoding process 16 is
indicated at 18 in Fig. 2, which is also referred to in the
following and which shows an example of a possible
environment in which the processes of Fig. 1 could be
employed.

By the encoding process 16 a data stream 20 is obtained. The
data stream 20 is composed of a sequence of consecutive data
packets 22, with the data stream 20 being illustrated as an
arrow. The direction of the arrow indicates which of the
data packets 22 precedes which data packet 22 of the data
stream 20. The data packets are indicated by individual
rectangles inside the arrow 20 and are labeled A-F. Each
data packet is uniquely associated with one of a
predetermined set of data packet types, each data packet
type being illustrated by one of the letters A-F. The data
packets 22 are, for example, associated with a respective
data packet type by a type number in a header of the data
packets 22. Each data packet type would be uniquely
associated with a different type number.

Several consecutive data packets 22 are grouped into an
access unit, as illustrated by braces 24. In this way, the
data stream 20 is composed of immediately consecutive
access units 24 which are themselves composed of immediately
consecutive data packets 22.

Although access units 24 could have any meaning, in the
following it will be assumed that each access unit 24
belongs to a specific time portion of the information signal
10. In the case of a multimedia signal, as illustrated at
10, each access unit 24 could, for example, represent a
coded version of a specific one of the pictures 12 and the
corresponding portion of the audio signal 14 of the
information signal 10.
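
The packet and access unit structure described above can be
pictured with a minimal Python sketch. The type numbers, the
DataPacket class and the helper names are hypothetical; the
text only states that a type number in the packet header
identifies the data packet type.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical type numbers for the illustrative data packet types A-F.
TYPE_NUMBERS = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6}


@dataclass(frozen=True)
class DataPacket:
    type_number: int  # header field identifying the data packet type
    payload: bytes    # stand-in for the coded content of the packet


def make_access_unit(types: str, picture_index: int) -> List[DataPacket]:
    """Build one access unit holding one packet per listed type."""
    return [
        DataPacket(TYPE_NUMBERS[t], f"{t}{picture_index}".encode())
        for t in types
    ]


def make_stream(access_units: List[List[DataPacket]]) -> List[DataPacket]:
    """Concatenate access units into one flat sequence of data packets."""
    return [packet for unit in access_units for packet in unit]


# Three access units, one per picture, each with packets of all types.
stream = make_stream([make_access_unit("ABCDEF", i) for i in range(3)])
```

Note that the flat stream carries no explicit access unit
markers; only the type numbers of the packets are available
to a later processing stage.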

As will be described in more detail below with respect to
Fig. 3, the encoding process 16 could be composed of
several steps. For example, as a first step the encoding
process 16 could involve a pre-coding step in which samples
of the information signal are pre-coded in order to obtain
syntax elements of various syntax element types, each
syntax element applying either to a portion of one picture
12 or a portion of the audio signal 14, to a whole picture
12 or to a sequence of pictures 12. As a second step, the
encoding process 16 could then involve a step of grouping
syntax elements being of the same syntax element type and
applying to the same pictures 12 to obtain the individual
data packets 22. In a further, last step, these data
packets 22 would then be arranged in a sequence in order to
obtain the data stream 20, the characteristics of which
will be described in more detail below.

In the following, the encoding process 16 is assumed to be
optimized in order to achieve a high-performance level
coded version of the information signal 10. In other words,
the encoding process 16 is assumed to be adjustable in the
sense that the encoding process creates, among others,
syntax elements and corresponding data packets 22 which are
not essential or absolutely necessary for retrieval of the
information signal from the resulting data stream 20. In
particular, it is assumed that the encoder 18 creates a
data stream 20 composed of data packets of all possible or
envisaged data packet types A-F. Of course, due to the
high-performance level of the data stream 20, it involves a
greater amount of data than a data stream of a
lower-performance level.

As shown in Fig. 2, it is assumed that the data stream 20
is first stored in a store 26, such as a video server or
the like, to which the encoder 18 is connected. Now, in
order to enable the transmission of the data stream 20 to a
receiver 28 via a transmission link 30 in an efficient way,
a gateway 32 is connected between the store 26 and the
receiver 28, and preferably between the store 26 and the
transmission link 30. This gateway 32 performs an
adaptation or conversion of the data stream 20 from the
high-performance level as it is provided in the store 26 to
a lower performance level which is adapted to the capacity
and performance of the transmission link 30 and the
receiver 28, respectively. For example, the transmission
link 30 may be a transmission link with a very low bit
error rate. In this case, the gateway 32 would convert the
data stream 20 into a data stream having less or no
redundancy information.

In order to enable this conversion, which is illustrated in
Fig. 1 by an arrow 34, in an effective and very simple way,
the encoding process 16 is performed such that the data
packets 22 within one access unit 24 are arranged in
accordance with a predetermined order among the data packet
types A-F. For illustration purposes only, it is assumed in
Fig. 1 that the predetermined order among the data packet
types A-F is equivalent to the alphabetical order. Thus, as
can be seen from Fig. 1, in each access unit 24 the
consecutive data packets 22 are arranged in alphabetical
order with respect to their type. It is emphasized that
there may be more than one data packet of a specific data
packet type in an access unit, although such circumstances
are not depicted in Fig. 1, and that the order among such
data packets of the same data packet type may or may not be
prescribed by a predetermined ordering rule. Moreover, even
though it is assumed that the present data stream 20 is of
the highest performance level, there may exist access units
24 in the data stream 20 which do not contain data packets
of all the data packet types A-F, although such an access
unit is not shown in Fig. 1. Moreover, it is noted that for
the purpose of enabling adaptation and conversion of the
data stream in a simple way, a more relaxed predetermined
order among the data packet types A-F could be sufficient,
as will be described in the following with respect to
Figs. 4 to 8. To be more precise, it is not necessary that
the predetermined order is so strict that each data packet
type is fixed to a position in front of all other data
packet types, between two other data packet types or after
all other data packet types. Rather, it could be sufficient
if the predetermined order contains just one or more
ordering rules such as "data packets of the removable data
packet type X (X = A, ..., F) have to precede or succeed
data packets of data packet type Y (Y ≠ X and
Y = A, ..., F)". In particular, it would be possible that,
instead of the strict alphabetical order, the predetermined
order allows the mixing-up of data packets of the data
packet types C and D, for example.
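
A relaxed order of this kind can be written down as a list of
pairwise ordering rules. The following Python sketch is only
an illustration: the concrete rule set, in which C and D may be
mixed while all other types keep their alphabetical position,
and the function name are assumptions rather than something
the text prescribes.

```python
from typing import Iterable, List, Tuple

# Hypothetical relaxed order: C and D may be mixed, everything else is fixed.
PRECEDE_RULES: List[Tuple[str, str]] = [
    ("A", "B"), ("B", "C"), ("B", "D"),
    ("C", "E"), ("D", "E"), ("E", "F"),
]


def complies(types: str,
             rules: Iterable[Tuple[str, str]] = PRECEDE_RULES) -> bool:
    """Check one access unit, given as a string of packet type letters.

    For each rule (first, second), every packet of type `first` must come
    before every packet of type `second`. Rules whose types are absent
    from the access unit are ignored.
    """
    for first, second in rules:
        first_pos = [i for i, t in enumerate(types) if t == first]
        second_pos = [i for i, t in enumerate(types) if t == second]
        if first_pos and second_pos and max(first_pos) > min(second_pos):
            return False
    return True
```

With these rules, the access units "ABCDEF" and "ABDCEF"
both comply, whereas "ACBDEF" does not, because a packet of
type C appears before the packet of type B.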

Due to the prescribed order among the data packet types
A-F, the gateway 32 can convert the data stream 20 having a
high-performance level to a data stream 36 having a lower
performance level merely by removing data packets of some of
the removable data packet types which, for example, contain
redundant picture information or supplemental enhancement
information which is not necessary for retrieval of the
pictures 12 or audio signal 14 from the data stream 20.
Moreover, the removed data packets of the removable data
packet types could as well concern essential information.
In this case, the gateway 32 would, for example, transmit
the information of these data packets via a different
transmission link to the receiver 28, as will be described
in more detail below.
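
The conversion performed by the gateway can be sketched as a
pure filtering step. In this hypothetical Python example the
function name, the tuple representation of a packet and the
choice of type A as the type carrying essential information
are assumptions made for illustration only.

```python
from typing import List, Set, Tuple

Packet = Tuple[str, bytes]  # (data packet type, payload)


def convert(stream: List[Packet], removed: Set[str], essential: Set[str]):
    """Drop packets of the `removed` types without touching the others.

    Removed packets whose type is in `essential` are collected so that
    they can be sent over a separate link instead of being discarded.
    """
    in_band: List[Packet] = []
    out_of_band: List[Packet] = []
    for packet in stream:
        packet_type = packet[0]
        if packet_type not in removed:
            in_band.append(packet)      # forwarded unchanged
        elif packet_type in essential:
            out_of_band.append(packet)  # forwarded on the extra link
    return in_band, out_of_band


stream = [(t, f"{t}{i}".encode()) for i in range(2) for t in "ABCDEF"]
in_band, out_of_band = convert(stream, removed={"A", "B", "E"},
                               essential={"A"})
```

The remaining packets keep their payloads and their relative
order, which is the property the gateway relies on.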

As can be seen in Fig. 1, it is assumed that in the
conversion process 34 performed by gateway 32 all data
packets 22 of the data packet types A, B, and E have been
removed from the data stream 20 in order to obtain a
shortened data stream 36. As can easily be understood, the
borders between successive access units can still easily be
detected in the data stream 36 by means of the
predetermined order: each time a data packet of a specific
data packet type X precedes a data packet of a data packet
type Y that, in accordance with the predetermined order,
precedes the data packet type X of the preceding data
packet, two successive access units 38 abut between these
data packets, i.e. a border between two successive access
units 38 exists. In the exemplary case of Fig. 1, this
condition applies each time a data packet of the data
packet type F precedes a data packet of the data packet
type C. Thus, the extent of each access unit 38 in the
converted data stream 36 can still easily be obtained at
the decoder side by use of the knowledge of the
predetermined order among data packet types even though, at
the decoder side, it is unknown which of the removable data
packet types have been removed. Thus, each of the access
units in the converted data stream 36, which are indicated
by braces 38, corresponds to one of the access units 24 in
the data stream 20. In particular, the access units 24 and
access units 38 are equal in number and order. Moreover,
since the borders between successive access units are
detectable even in the modified data stream 36 and are
arranged at the same places, removal of data packets merely
results in reducing the size of access units 38 of data
stream 36 relative to the access units 24 in data stream
20.
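
The border detection rule stated above can be tried out on the
example of Fig. 1. The following Python sketch assumes the
strict alphabetical order A-F and represents each data packet
by its type letter only; the function name is hypothetical.

```python
from typing import List

ORDER = "ABCDEF"  # predetermined order assumed in Fig. 1
RANK = {t: i for i, t in enumerate(ORDER)}


def split_access_units(types: str) -> List[str]:
    """Split a sequence of packet types into access units.

    A border is placed wherever a packet's type precedes, in the
    predetermined order, the type of the packet just before it.
    """
    units: List[str] = []
    for t in types:
        if units and RANK[t] >= RANK[units[-1][-1]]:
            units[-1] += t   # same access unit continues
        else:
            units.append(t)  # first packet, or a border was crossed
    return units


original = "ABCDEF" * 3
# Conversion as in Fig. 1: all packets of types A, B and E are removed.
converted = "".join(t for t in original if t not in "ABE")
```

Splitting the original and the converted sequence yields the
same number of access units in both cases; only their size
differs.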

After transmission of the data stream 36 via the
transmission link 30 to receiver 28, the converted data
stream 36 is decoded at the receiver 28 in a decoding
process 40. The receiver 28 may decode the data stream 36
solely by use of the data stream itself if the data packets
removed in the converting process 34 merely contained
information not necessary for retrieval of the original
information signal 10. Otherwise, the receiver 28 decodes
the converted data stream 36 based on information contained
in the data packets that were removed in the converting
process 34 and transmitted to receiver 28 via an extra
transmission link, for example.

The result of the decoding process 40 is a decoded
information signal 42 in a quality as it would be obtained
by directly decoding the data stream 20. Alternatively, the
quality of the decoded information signal 42 is somewhat
reduced in comparison to the quality of a decoded
information signal as obtained by directly decoding data
stream 20.

To summarize, by defining the predetermined order among the
data packet types, it is possible not only to maintain the
correspondence between access units 24 in the original data
stream 20 and the access units 38 in the converted data
stream 36 but also to enable the receiver 28 to associate
each data packet with the access unit it originally
belonged to in the original data stream 20. The latter
guarantees that a receiver 28 buffering the incoming data
packets and emptying the buffer in units of access units is
not liable to a buffer overflow, as will be described in
more detail below.
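
A receiver buffer that is emptied in units of access units
could look like the following Python sketch. The class name,
the fill-level counter and the representation of packets by
their type letter are assumptions for illustration; the sketch
uses the strict order of Fig. 1.

```python
from typing import List, Optional


class AccessUnitBuffer:
    """Collects incoming packet types and releases whole access units."""

    def __init__(self, order: str) -> None:
        self.rank = {t: i for i, t in enumerate(order)}
        self.pending: List[str] = []
        self.max_fill = 0  # largest number of packets held after a push

    def push(self, packet_type: str) -> Optional[List[str]]:
        """Add one packet; return the finished access unit at a border."""
        finished = None
        if self.pending and (
            self.rank[packet_type] < self.rank[self.pending[-1]]
        ):
            # The new packet starts the next access unit: empty the buffer.
            finished, self.pending = self.pending, []
        self.pending.append(packet_type)
        self.max_fill = max(self.max_fill, len(self.pending))
        return finished
```

Because every detected border empties the buffer, its fill
level in this sketch is bounded by the size of the largest
access unit, however long the stream is.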

In the following, a specific embodiment of the present
invention will be described in view of a video signal as the
information signal. Reference will also be made to Fig. 2
in order to illustrate this specific embodiment in view of
an exemplary application environment.

Fig. 3 shows an embodiment of an encoder 18 for encoding a
video signal into a data stream. The encoder 18 comprises a
precoder 50, an encoder 52 and an arranging unit 54, all
being connected in series between an input 56 and an output
58 of the encoder 18. At the input 56 the encoder 18
receives the video signal, wherein in Fig. 3 one picture 12
of the video signal is shown for illustration. All pictures
12 of the video signal are composed of a plurality of
pixels or picture samples arranged in rows and columns.

The video signal or pictures 12 are fed via input 56 to the
video precoder 50. The video precoder 50 treats the
pictures 12 in units of so-called macroblocks 12a, i.e.
blocks of, for example, 4x4 pixel samples. On each
macroblock 12a, precoder 50 performs a transformation into
spectral transformation coefficients followed by a
quantization into transform coefficient levels. Moreover,
intra-frame prediction or motion compensation is used so
that the afore-mentioned steps are performed not directly
on the pixel data but on the differences between the pixel
data and predicted pixel values, thereby achieving small
values which may more easily be compressed.
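
The three steps named above (prediction difference, transform,
quantization) can be illustrated on one 4x4 block. The text
does not specify the transform or the quantizer, so the
following Python sketch substitutes a 4x4 Hadamard transform
and a uniform quantizer with a hypothetical step size; it is
not meant to reproduce the precoder 50.

```python
from typing import List

Block = List[List[int]]

# 4x4 Hadamard matrix, used here only as a simple stand-in transform.
H: Block = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul(a: Block, b: Block) -> Block:
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(a: Block) -> Block:
    return [list(row) for row in zip(*a)]


def precode_block(block: Block, prediction: Block, qstep: int) -> Block:
    """Return quantized transform coefficient levels of the residual."""
    residual = [[block[i][j] - prediction[i][j] for j in range(4)]
                for i in range(4)]
    coeffs = matmul(matmul(H, residual), transpose(H))
    # Uniform quantization, truncating toward zero.
    return [[int(c / qstep) for c in row] for row in coeffs]
```

If the prediction matches the block exactly, all levels are
zero; a good prediction therefore leaves few non-zero values
to be coded.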

The macroblocks into which the picture 12 is partitioned
are grouped into several slices. For each slice a number of
syntax elements are generated which form a coded version of
the macroblocks of the slice. For illustration purposes, in
Fig. 3 the picture 12 is shown as being partitioned into
three slice groups or slices 12b.

The syntax elements output by precoder 50 are dividable
into several categories or types. The encoder 52 collects
the syntax elements of the same category and belonging to
the same slice of the same picture 12 of a sequence of
pictures and groups them to obtain data packets. In
particular, in order to obtain a data packet, the encoder
52 forms a compressed representation of the syntax elements
belonging to a specific data packet to obtain payload data.
To this payload data, encoder 52 attaches a type number
indicating the data packet type to obtain a data packet.
The precoder 50 and the encoder 52 of the encoder 18 form a
so-called video coding layer (VCL) for efficiently
representing the video content.

The data packets output by encoder 52 are arranged into a
data stream by arranging unit 54, as will be described in
more detail with respect to Fig. 4. The arranging unit 54
represents the network abstraction layer (NAL) of encoder
18 for formatting the VCL representation of the video and
providing header information in a manner appropriate for
conveyance by a variety of transport layers or storage
media.

The structure of the data stream output by encoder 18 of
Fig. 3 is described in more detail below with respect to
Fig. 4. In Fig. 4, the data stream output at output 58 is
shown at 70. The data stream 70 is organized in consecutive
blocks 72 of coded video sequences of consecutive pictures
of a video. The coded video sequence blocks 72 internally
consist of a series of access units 74 that are sequential
in the data stream 70. Each coded video sequence 72 can be
decoded independently of any other coded video sequence 72
from the data stream 70, given the necessary parameter set
information, which may be conveyed "in-band" or
"out-of-band" as will be described in more detail below.
Each coded video sequence 72 uses only one sequence
parameter set.

At the beginning of a coded video sequence 72 is an access
unit 74 of a special type, called an instantaneous decoding
refresh (IDR) access unit. An IDR access unit contains an
intra picture, i.e. a coded picture that can be decoded
without decoding any previous pictures in the data stream
70. The presence of an IDR access unit in the data stream 70
indicates that no subsequent picture in the stream 70 will
require reference to pictures prior to the intra picture it
contains in order to be decoded. The data stream 70 may
contain one or more coded video sequences 72.

An access unit 74 is a set of NAL units 76 in a specified
form, the specified form being explained in more detail
below. The decoding of each access unit 74 results in one
decoded picture. In the following, the data stream 70 is
also sometimes called NAL unit stream 70.

The NAL units 76 correspond with the data packets mentioned
above with respect to Fig. 3. In other words, the coded
video data is organized by encoder 52 in NAL units 76. Each
NAL unit 76 is effectively a packet that contains an integer
number of bytes. The first byte of each NAL unit is a header
byte 78 that contains an indication of the type of data in
the NAL unit, and the remaining bytes contain payload data
80 of the type indicated by header 78.
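
Merely as an illustration, and not as part of the described
embodiment, the reading of the header byte 78 may be sketched
as follows. The sketch assumes the one-byte header layout of
H.264/AVC (one bit forbidden_zero_bit, two bits nal_ref_idc,
five bits nal_unit_type), which the present passage does not
spell out; the names NalHeader and parse_nal_header are
hypothetical.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    forbidden_zero_bit: int
    nal_ref_idc: int
    nal_unit_type: int


def parse_nal_header(nal_unit: bytes) -> NalHeader:
    """Split the first byte of a NAL unit into its three fields.

    Assumption: H.264/AVC bit layout (1 + 2 + 5 bits, MSB first).
    """
    if not nal_unit:
        raise ValueError("a NAL unit contains at least the header byte")
    header = nal_unit[0]
    return NalHeader(
        forbidden_zero_bit=(header >> 7) & 0x01,
        nal_ref_idc=(header >> 5) & 0x03,
        nal_unit_type=header & 0x1F,
    )
```

Under this assumption, the remaining bytes nal_unit[1:] would
be the payload data 80.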

The payload data 80 in the NAL units 76 may be interleaved,
as necessary, with emulation prevention bytes. Emulation
prevention bytes are bytes inserted with a specific value to
prevent a particular pattern of data called a start code
prefix from being accidentally generated inside the payload.
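
For illustration only, the insertion and removal of emulation
prevention bytes may be sketched as follows. The sketch
assumes the H.264/AVC convention, which is not stated in this
passage: the byte 0x03 is inserted whenever two zero bytes
would otherwise be followed by a byte of value 0x00 to 0x03.
Both function names are hypothetical.

```python
def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert 0x03 so that 00 00 0x (x <= 3) never appears in the output."""
    out = bytearray()
    zeros = 0  # number of consecutive zero bytes written so far
    for b in rbsp:
        if zeros >= 2 and b <= 0x03:
            out.append(0x03)  # emulation prevention byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def remove_emulation_prevention(payload: bytes) -> bytes:
    """Inverse operation: drop each 0x03 that follows two zero bytes."""
    out = bytearray()
    zeros = 0
    for b in payload:
        if zeros >= 2 and b == 0x03:
            zeros = 0
            continue  # skip the emulation prevention byte
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)
```

With this convention, a payload that happens to contain the
bytes 00 00 01 is carried as 00 00 03 01, so that the three-byte
pattern 00 00 01 remains free for use as a start code prefix.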

The NAL unit structure definition specifies a generic format
for use in both packet-oriented and bit stream-oriented
transport systems, and a series of NAL units generated by an
encoder is referred to as the NAL unit stream 70.

For example, some systems require delivery of the entire or
partial NAL unit stream 70 as an ordered stream of bytes or
bits within which the locations of NAL unit boundaries 82
need to be identifiable from patterns within the coded data
itself.

For use in such systems, encoder 18 creates data stream 70
in a byte stream format. In the byte stream format each NAL
unit 76 is prefixed by a specific pattern of, for example,
three bytes, called a start code prefix. This start code
prefix is not shown in Fig. 4 since it is optional. If
present, the start code prefix of a NAL unit precedes the
header byte 78. The boundaries of the NAL units 76 can then
be identified by searching the coded data for the unique
start code prefix pattern. Moreover, the NAL data stream
output by encoder 18 of Fig. 3 may be interleaved with
emulation prevention bytes within the payload data blocks 80
of the NAL units 76 in order to guarantee that start code
prefixes are unique identifiers of the start of a new NAL
unit 76. A small amount of additional data (one byte per
video picture) may also be added to allow decoders that
operate in systems that provide streams of bits without
alignment to byte boundaries to recover the necessary
alignment from the data in the stream.

Additional data could also be inserted into the byte stream
format that allows expansion of the amount of data to be
sent and can aid in achieving more rapid byte alignment
recovery, if desired.
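
As a minimal sketch of how NAL unit boundaries might be
located in such a byte stream, the following function searches
for a three-byte start code prefix. The prefix value 00 00 01
is that of H.264/AVC and is an assumption here, as is the
premise that a NAL unit never ends in a zero byte, so that
zero bytes directly in front of a prefix are padding. The
function name is hypothetical.

```python
from typing import List

START_CODE_PREFIX = b"\x00\x00\x01"  # assumed value (H.264/AVC)


def split_byte_stream(stream: bytes) -> List[bytes]:
    """Return the NAL units (header byte plus payload) found in a byte stream."""
    nal_units = []
    pos = stream.find(START_CODE_PREFIX)
    while pos != -1:
        start = pos + len(START_CODE_PREFIX)
        nxt = stream.find(START_CODE_PREFIX, start)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes in front of the next prefix are treated as padding.
        nal_units.append(stream[start:end].rstrip(b"\x00"))
        pos = nxt
    return nal_units
```

The search only yields correct boundaries if the payload has
been protected by emulation prevention bytes beforehand.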

In other systems, like internet protocol or RTP systems, the
coded data or data stream 70 is carried in packets that are
framed by the system transport protocol, and an
identification of the boundaries of NAL units within the
packets can be established without use of start code prefix
patterns. In such systems, the inclusion of start code
prefixes in the data of NAL units 76 would be a waste of
data-carrying capacity, so instead the NAL units 76 can be
carried in data packets without start code prefixes.

NAL units are classified into VCL and non-VCL NAL units. The
VCL NAL units contain the data that represents the values of
the samples in the video pictures 12 and are, therefore,
necessary for decoding, and the non-VCL NAL units contain
any associated additional information such as parameter
sets, i.e. important header data that can apply to a large
number of VCL NAL units, and supplemental enhancement
information, such as timing information and other
supplemental data that may enhance usability of the decoded
video signal (42 in Fig. 1) but are not necessary for
decoding the values of the samples in the video pictures 12.

A parameter set is supposed to contain information that is
expected to change rarely and that applies to the decoding
of a large number of VCL NAL units. There are two types of
parameter sets:

- sequence parameter sets, which apply to a series of
  consecutive coded video pictures called a coded video
  sequence, and

- picture parameter sets, which apply to the decoding of one
  or more individual pictures 12 within a coded video
  sequence 72.

The sequence and picture parameter set mechanism, which is
described in more detail below, decouples the transmission
of infrequently changing information from the transmission
of coded representations of the values of the samples in the
video pictures 12. Each VCL NAL unit 76 contains in its
payload data portion 80 an identifier that refers to the
content of the relevant picture parameter set, and each
picture parameter set non-VCL NAL unit contains in its
payload data portion 80 an identifier that refers to the
content of the relevant sequence parameter set. In this
manner, a small amount of data, i.e. the identifier, can be
used to refer to a larger amount of information, i.e. the
parameter set, without repeating that information within
each VCL NAL unit.
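
The two-level reference just described (VCL NAL unit to
picture parameter set, picture parameter set to sequence
parameter set) may be sketched as follows. This is an
illustrative data structure only; the class names and the
example fields width_in_mbs and entropy_mode are hypothetical
and are not taken from the described embodiment.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class SequenceParameterSet:
    seq_parameter_set_id: int
    width_in_mbs: int  # hypothetical example parameter


@dataclass
class PictureParameterSet:
    pic_parameter_set_id: int
    seq_parameter_set_id: int  # identifier referring to the sequence parameter set
    entropy_mode: str  # hypothetical example parameter


class ParameterSetStore:
    """Keeps received parameter sets so that slices can refer to them by id."""

    def __init__(self) -> None:
        self.sps: Dict[int, SequenceParameterSet] = {}
        self.pps: Dict[int, PictureParameterSet] = {}

    def store_sps(self, sps: SequenceParameterSet) -> None:
        self.sps[sps.seq_parameter_set_id] = sps

    def store_pps(self, pps: PictureParameterSet) -> None:
        self.pps[pps.pic_parameter_set_id] = pps

    def resolve(
        self, pic_parameter_set_id: int
    ) -> Tuple[PictureParameterSet, SequenceParameterSet]:
        """Follow the id carried in a VCL NAL unit to both parameter sets."""
        pps = self.pps[pic_parameter_set_id]  # KeyError if not yet available
        sps = self.sps[pps.seq_parameter_set_id]
        return pps, sps
```

A slice thus only needs to carry the small identifier, while
the parameter sets themselves are stored once per identifier.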

Sequence and picture parameter sets can be sent well ahead
of the VCL NAL units that they apply to, and can be repeated
to provide robustness against data loss, as will be
described in more detail below. In some applications,
parameter sets may be sent within the channel that carries
the VCL NAL units, termed "in-band" transmission. In other
applications, it can be advantageous to convey the parameter
sets "out-of-band" using a more reliable transport mechanism
or transmission link than the video channel for transmitting
the NAL data stream 70 itself, as will be described in the
following with respect to Figs. 6 and 7.

Now, before explaining in detail the predetermined order
among the NAL unit types in accordance with the present
embodiment, the different NAL unit types are listed in
Table 1 below along with their associated NAL unit type
number for reasons of completeness.



Table 1 - NAL units

nal_unit_type  Content of NAL unit and RBSP syntax structure   C
-------------  ----------------------------------------------  -------
0              Unspecified
1              Coded slice of a non-IDR picture                2, 3, 4
               slice layer without partitioning NAL unit()
2              Coded slice data partition A                    2
               slice data partition a layer NAL unit()
3              Coded slice data partition B                    3
               slice data partition b layer NAL unit()
4              Coded slice data partition C                    4
               slice data partition c layer NAL unit()
5              Coded slice of an IDR picture                   2, 3
               slice layer without partitioning NAL unit()
6              Supplemental enhancement information (SEI)      5
               sei NAL unit()
7              Sequence parameter set                          0
               seq parameter set NAL unit()
8              Picture parameter set                           1
               pic parameter set NAL unit()
9              Access unit delimiter                           6
               access unit delimiter NAL unit()
10             End of sequence                                 7
               end of seq NAL unit()
11             End of stream                                   8
               end of stream NAL unit()
12             Filler data                                     9
               filler data NAL unit()
13..23         Reserved
24..31         Unspecified
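
Purely as an illustration, the content of Table 1 may be held
in a lookup structure such as the following, which maps each
NAL unit type number to its content and to the syntax element
categories C it carries. The names NAL_UNIT_TYPES and
describe_nal_unit_type are hypothetical.

```python
from typing import Dict, Tuple

# nal_unit_type -> (content, categories C), transcribed from Table 1
NAL_UNIT_TYPES: Dict[int, Tuple[str, Tuple[int, ...]]] = {
    1: ("Coded slice of a non-IDR picture", (2, 3, 4)),
    2: ("Coded slice data partition A", (2,)),
    3: ("Coded slice data partition B", (3,)),
    4: ("Coded slice data partition C", (4,)),
    5: ("Coded slice of an IDR picture", (2, 3)),
    6: ("Supplemental enhancement information (SEI)", (5,)),
    7: ("Sequence parameter set", (0,)),
    8: ("Picture parameter set", (1,)),
    9: ("Access unit delimiter", (6,)),
    10: ("End of sequence", (7,)),
    11: ("End of stream", (8,)),
    12: ("Filler data", (9,)),
}


def describe_nal_unit_type(nal_unit_type: int) -> str:
    """Return the Table 1 description for a NAL unit type number."""
    if not 0 <= nal_unit_type <= 31:
        raise ValueError("nal_unit_type must be in the range 0 to 31")
    if nal_unit_type in NAL_UNIT_TYPES:
        return NAL_UNIT_TYPES[nal_unit_type][0]
    if 13 <= nal_unit_type <= 23:
        return "Reserved"
    return "Unspecified"  # 0 and 24 to 31
```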

As can be seen from Table 1, NAL units 76 having NAL unit
type 1 indicated in their header byte 78 belong to one of
the non-IDR access units, i.e. one of the access units 74
which succeed the first access unit of each coded video
sequence 72, which is the IDR access unit as mentioned
before. Moreover, as indicated in Table 1, a NAL unit 76 of
NAL unit type 1 represents a coded version of a slice of a
non-IDR picture, i.e. a picture other than the first picture
of a coded video sequence 72. As is shown in the last column
of Table 1, in NAL units 76 of NAL unit type 1 syntax
elements of categories C = 2, 3 and 4 are combined.

At the side of the encoder, it may have been decided not to
combine the syntax elements of categories 2, 3 and 4 of one
slice in one common NAL unit 76. In this case, partitioning
is used in order to distribute the syntax elements of the
different categories 2, 3 and 4 to NAL units of different
NAL unit types, i.e. NAL unit types 2, 3 and 4 for
categories C = 2, 3 and 4, respectively. To be more
specific, partition A contains all syntax elements of
category 2. Category 2 syntax elements include all syntax
elements in the slice header and slice data syntax
structures other than the syntax elements concerning single
transform coefficients. Generally speaking, partition A
syntax elements as contained in NAL units of NAL unit type 2
are more important than the syntax elements contained in NAL
units 76 of NAL unit types 3 and 4. These latter NAL units
contain syntax elements of categories 3 and 4, which include
syntax elements concerning transform coefficients.

As can be seen, slice data partitioning is not possible
within the first picture of a video sequence, so that coded
versions of slices of an IDR picture are conveyed by NAL
units 76 of NAL unit type 5.

NAL units 76 of NAL unit type 6 contain in their payload
data portion 80 supplemental enhancement information (SEI),
with the afore-mentioned examples.

NAL units 76 of NAL unit type 7 contain in their payload
data 80 a sequence parameter set, while NAL units 76 of NAL
unit type 8 contain in their payload data 80 a picture
parameter set.

NAL units 76 of NAL unit type 9 are called access unit
delimiters and indicate the beginning of an access unit. As
will become clear from the following description, access
unit delimiters are optional and not necessary for parsing
the NAL data stream 70.

NAL units of NAL unit types 10 and 11 are NAL units
indicating the end of a sequence and the end of the whole
data stream, respectively. NAL units 76 of NAL unit type 12
contain in their payload portion 80 filler data as may be
necessary for some networks. NAL unit types 13 to 23 and 24
to 31 are reserved and unspecified NAL unit types,
respectively, for specific applications.

Now, after having described rather broadly the structure of
the NAL unit stream 70 generated by the encoder 18 of
Fig. 3, the constraints on the order of the NAL units 76 in
the bit stream 70 are described in more detail with
reference to Table 1 and Fig. 4. Any order of NAL units 76
in the data or bit stream 70 obeying the below-mentioned
constraints is, in accordance with the present embodiment of
the present invention, in conformity with the parsing rules
used by a decoder of interest in order to retrieve the coded
information, i.e. the video signal. Decoders using those
parsing rules shall be capable of receiving NAL units 76 in
this parsing or decoding order and retrieving the syntax
elements.

In the following, the positioning of sequence and picture
parameter set NAL units, i.e. NAL units of NAL unit types 7
and 8, is specified first. Then, the order of access units
74 is specified. Then, the order of NAL units 76 and coded
pictures 12 and their association to access units 74 is
specified. Finally, the order of VCL NAL units and their
association to coded pictures is described.

As mentioned before, NAL units 76 are classified into VCL
and non-VCL NAL units. The VCL NAL units contain the data
that represents the values of the samples in the video
pictures, and the non-VCL NAL units contain any associated
additional information such as parameter sets and
supplemental enhancement information, such as timing
information and other supplemental data that may enhance
usability of the decoded video signal but are not necessary
for decoding the values of the samples in the video
pictures. With reference to Table 1, which specifies the
type of RBSP data structure contained in the NAL unit 76,
VCL NAL units are specified as those NAL units having
nal_unit_type = 1, 2, 3, 4, 5 or 12; all remaining NAL units
are called non-VCL NAL units.

The NAL units having a NAL unit type other than 1 to 5, and
the NAL units having a NAL unit type of 1 to 5 and,
concurrently, having a syntax element indicating that they
concern redundant pictures, are removable NAL units.
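
The rule for removable NAL units stated above may be sketched
as a simple predicate. This is an illustration under the
assumption that the syntax element indicating a redundant
picture has already been parsed into a boolean; the names
is_removable and drop_removable are hypothetical.

```python
from typing import Iterable, List, Tuple


def is_removable(nal_unit_type: int, is_redundant: bool = False) -> bool:
    """True for every NAL unit except the VCL NAL units of a primary coded picture."""
    if not 1 <= nal_unit_type <= 5:
        return True  # every type other than 1 to 5 is removable
    return is_redundant  # types 1 to 5 are removable only for redundant pictures


def drop_removable(
    nal_units: Iterable[Tuple[int, bool]]
) -> List[Tuple[int, bool]]:
    """Keep only the non-removable (nal_unit_type, is_redundant) entries, in order."""
    return [unit for unit in nal_units if not is_removable(*unit)]
```

Dropping NAL units in this way leaves the relative order of
the remaining NAL units unchanged.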

In the following, the payload data 80 is sometimes called
Raw Byte Sequence Payload or RBSP. The RBSP 80 is a syntax
structure containing an integer number of bytes that is
encapsulated in a NAL unit 76. An RBSP is either empty or
has the form of a string of data bytes containing syntax
elements followed by an RBSP stop bit and followed by zero
or more subsequent bytes equal to zero.
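
As a sketch only, the end of the syntax element data in an
RBSP could be located as follows. The sketch assumes, as in
H.264/AVC, that the RBSP stop bit is a single bit equal to one
which is followed by zero bits up to the next byte boundary;
this detail is not stated in the present passage. The function
name is hypothetical.

```python
from typing import Tuple


def strip_rbsp_trailing(rbsp: bytes) -> Tuple[bytes, int]:
    """Return (data bytes, number of valid data bits) of an RBSP.

    Assumption: the stop bit is the last bit equal to one in the RBSP.
    """
    body = rbsp.rstrip(b"\x00")  # drop the trailing bytes equal to zero
    if not body:
        return b"", 0  # empty RBSP
    last = body[-1]
    stop_bit_pos = (last & -last).bit_length() - 1  # lowest set bit, 0 = LSB
    data_bits = (len(body) - 1) * 8 + (7 - stop_bit_pos)
    # Clear the stop bit and everything below it in the last byte.
    cleaned_last = last & ~((1 << (stop_bit_pos + 1)) - 1) & 0xFF
    data = body[:-1] + bytes([cleaned_last])
    return data[: (data_bits + 7) // 8], data_bits
```

The returned bit count tells a parser where the syntax
elements end within the last data byte.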

In this way, a NAL unit 76 is a syntax structure containing
an indication of the type of data to follow, i.e. the header
byte 78, and bytes 80 containing the data in the form of an
RBSP interspersed as necessary with emulation prevention
bytes as already noted above.

On the other hand, an access unit 74 represents a primary
coded picture, zero or more corresponding redundant coded
pictures, and zero or more non-VCL NAL units. The
association of VCL NAL units to primary or redundant coded
pictures or access units is described below.

In order to allow the removal of removable NAL units 76 from
data stream 70 while maintaining the decoding or parsing
order, the format of the access unit 74 is as shown in
Fig. 5. The NAL units 76 that can be removed are all types
except VCL NAL units of a primary coded picture, i.e. all
NAL unit types except NAL unit types 1 to 5.

As shown in Fig. 5, each access unit contains in any case a
set of VCL NAL units that together compose a primary coded
picture 100. An access unit may be prefixed with an access
unit delimiter 102, i.e. a NAL unit having nal_unit_type 9,
to aid in locating the start of the access unit 74. Some
supplemental enhancement information (SEI) in the form of
SEI NAL units of NAL unit type 6 containing data such as
picture timing information may also precede the primary
coded picture 100, this SEI block being indicated by
reference number 104.

The primary coded picture consists of a set of VCL NAL units
76 consisting of slices or slice data partitions that
represent samples of the video picture.

Following the primary coded picture 100 may be some
additional VCL NAL units that contain redundant
representations of areas of the same video picture. These
are referred to as redundant coded pictures 106, and are
available for use by a decoder in recovering from loss or
corruption of the data in the primary coded pictures 100.
Decoders are not required to decode redundant coded pictures
if they are present. Finally, if the coded picture the
access unit 74 is associated with is the last picture of a
coded video sequence 72, wherein a sequence of pictures is
independently decodable and uses only one sequence parameter
set, an end of sequence NAL unit 108 may be present to
indicate the end of the sequence 72. And if the coded
picture is the last coded picture in the entire NAL unit
stream 70, an end of stream NAL unit 110 may be present to
indicate that the stream 70 is ending.

Fig. 5 shows the structure of access units not containing
any NAL units with nal_unit_type = 0, 7, 8 or in the range
of 12 to 31, inclusive. The reason for having limited the
illustration of access units to cases where NAL units of the
aforementioned types have been removed is that, as already
noted above, sequence and picture parameter sets in NAL
units of NAL unit types 7 and 8 may, in some applications,
be conveyed "out-of-band" using a reliable transport
mechanism or, in a redundant manner, in-band. Thus, an
encoder 18 may output the sequence and picture parameter
sets in-band, i.e. in the data stream 70, or out-of-band,
i.e. using an extra output terminal.

In any case, the encoder 18 or any means in between the
encoder 18 and the decoder 28 has to guarantee that the
following constraints on the order of sequence and picture
parameter set RBSPs and their activation are obeyed.

A picture parameter set RBSP includes parameters that can be
referred to by coded slice NAL units or coded slice data
partition A NAL units of one or more coded pictures.

I) When a picture parameter set RBSP having a particular
   value of pic_parameter_set_id, i.e. the header byte 78,
   is referred to by a coded slice NAL unit or coded slice
   data partition A NAL unit using that value of
   pic_parameter_set_id, it is activated. This picture
   parameter set RBSP is called the active picture parameter
   set RBSP until it is deactivated by the activation of
   another picture parameter set RBSP. A picture parameter
   set RBSP with that particular value of
   pic_parameter_set_id shall be available to the decoding
   process at decoder 28 prior to its activation. Thus, the
   encoder 18 has to take this into account when
   transmitting sequence and picture parameter sets in-band
   or out-of-band.



Any picture parameter set NAL unit containing the value of
pic_parameter_set_id for the active picture parameter set
RBSP shall have the same content as that of the active
picture parameter set RBSP unless it follows the last VCL
NAL unit of a coded picture and precedes the first VCL NAL
unit of another coded picture.

A sequence parameter set RBSP includes parameters that can
be referred to by one or more picture parameter set RBSPs or
one or more SEI NAL units containing a buffering period SEI
message.

II) When a sequence parameter set RBSP (with a particular
    value of seq_parameter_set_id) is referred to by
    activation of a picture parameter set RBSP (using that
    value of seq_parameter_set_id) or is referred to by an
    SEI NAL unit containing a buffering period SEI message
    (using that value of seq_parameter_set_id), it is
    activated. This sequence parameter set RBSP is called
    the active sequence parameter set RBSP until it is
    deactivated by the activation of another sequence
    parameter set RBSP. A sequence parameter set RBSP with
    that particular value of seq_parameter_set_id shall be
    available to the decoding process prior to its
    activation. An activated sequence parameter set RBSP
    shall remain active for the entire coded video sequence.

Any sequence parameter set NAL unit containing the value of
seq_parameter_set_id for the active sequence parameter set
RBSP shall have the same content as that of the active
sequence parameter set RBSP unless it follows the last
access unit of a coded video sequence and precedes the first
VCL NAL unit and the first SEI NAL unit containing a
buffering period SEI message (when present) of another coded
video sequence.



In the following, the order of NAL units and coded pictures
and their association to access units is described in more
detail, as before with reference to Fig. 5.

An access unit 74 consists of one primary coded picture 100,
zero or more corresponding redundant coded pictures 106, and
zero or more non-VCL NAL units 102, 104, 108 and 110, as
already mentioned above.

The association of VCL NAL units to primary or redundant
coded pictures is described below.

The first of any of the following NAL units 76 after the
last VCL NAL unit of a primary coded picture 100 specifies
the start of a new access unit.

a) access unit delimiter NAL unit (NAL unit type 9) (when
   present)

b) sequence parameter set NAL unit (NAL unit type 7) (when
   present)

c) picture parameter set NAL unit (NAL unit type 8) (when
   present)

d) SEI NAL unit (NAL unit type 6) (when present)

e) NAL units with nal_unit_type in the range of 13 to 18,
   inclusive

f) first VCL NAL unit of a primary coded picture (NAL unit
   type 1 to 5) (always present)
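
The boundary rule of items a) to f) may be sketched as
follows. The sketch is an illustration under two assumptions:
each NAL unit is reduced to its type number, and the question
whether a VCL NAL unit is the first one of a new primary coded
picture has already been answered and is supplied as a
boolean. The names used are hypothetical.

```python
from typing import Iterable, List, Tuple

# Types of items a) to e): 9, 7, 8, 6 and the range 13 to 18.
AU_START_TYPES = {6, 7, 8, 9} | set(range(13, 19))
VCL_TYPES = {1, 2, 3, 4, 5}


def split_access_units(
    nal_units: Iterable[Tuple[int, bool]]
) -> List[List[int]]:
    """Group (nal_unit_type, starts_new_picture) entries into access units."""
    access_units: List[List[int]] = []
    current: List[int] = []
    seen_vcl = False  # a VCL NAL unit of the current primary picture was seen
    for nal_type, starts_new_picture in nal_units:
        is_vcl = nal_type in VCL_TYPES
        boundary = seen_vcl and (
            nal_type in AU_START_TYPES or (is_vcl and starts_new_picture)
        )
        if boundary:
            access_units.append(current)
            current = []
            seen_vcl = False
        current.append(nal_type)
        seen_vcl = seen_vcl or is_vcl
    if current:
        access_units.append(current)
    return access_units
```

End of sequence, end of stream and filler data NAL units do
not open a new access unit in this sketch and therefore stay
with the preceding primary coded picture.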

The constraints for the detection of the first VCL NAL unit
of a primary coded picture are specified further below and
can be used, given the above claimed restriction, to
distinguish access units even if NAL units that are allowed
to be removed are removed. The NAL units that can be removed
are all types except VCL NAL units of a primary coded
picture.

The following constraints shall be obeyed by the order of
the coded pictures and non-VCL NAL units within an access
unit.



g) When an access unit delimiter NAL unit (NAL unit type 9)
   is present, it shall be the first NAL unit. There shall
   be at most one access unit delimiter NAL unit in any
   access unit.

h) When any SEI NAL units (NAL unit type 6) are present,
   they shall precede the primary coded picture.

i) When an SEI NAL unit containing a buffering period SEI
   message is present, the buffering period SEI message
   shall be the first SEI message payload of the first SEI
   NAL unit in the access unit, wherein a buffering period
   SEI NAL unit is for controlling the buffering management
   at the decoder's side.

j) The primary coded picture (consisting of NAL units of NAL
   unit types 1 to 5 and having a redundant picture count
   value equal to zero) shall precede the corresponding
   redundant coded pictures.

k) When redundant coded pictures (consisting of NAL units of
   NAL unit types 1 to 5 and having a redundant picture
   count value not equal to zero) are present, they shall be
   ordered in ascending order of the redundant picture count
   value redundant_pic_cnt.

l) When an end of sequence NAL unit (NAL unit type 10) is
   present, it shall follow the primary coded picture and
   all redundant coded pictures (if any).

m) When an end of stream NAL unit (NAL unit type 11) is
   present, it shall be the last NAL unit.

n) NAL units having nal_unit_type equal to 0, 12, or in the
   range of 19 to 31, inclusive, shall not precede the first
   VCL NAL unit of the primary coded picture.

o) Sequence parameter set NAL units or picture parameter set
   NAL units may be present in an access unit, but cannot
   follow the last VCL NAL unit of the primary coded picture
   within the access unit, as this condition would specify
   the start of a new access unit (see constraint b)).

p) When a NAL unit having nal_unit_type equal to 7 or 8 is
   present in an access unit, it may not be referred to in
   the coded pictures of the access unit in which it is
   present, and may be referred to in coded pictures of
   subsequent access units.

In the following, the order of VCL NAL units and their
association to coded pictures is described in more detail.

q) Each VCL NAL unit is part of a coded picture.

r) The order of the VCL NAL units within a coded IDR picture
   is constrained as follows.

   - If arbitrary slice order is allowed, as specified by a
     certain syntax element, coded slice of an IDR picture
     NAL units may have any order relative to each other.

   - Otherwise (arbitrary slice order is not allowed), the
     order of coded slice of an IDR picture NAL units shall
     be in the order of increasing macroblock address for
     the first macroblock of each coded slice of an IDR
     picture NAL unit.

s) The order of the VCL NAL units within a coded non-IDR
   picture is constrained as follows.

   - If arbitrary slice order is allowed, as specified by a
     specific syntax element, coded slice of a non-IDR
     picture NAL units or coded slice data partition A NAL
     units may have any order relative to each other. A
     coded slice data partition A NAL unit with a particular
     value of slice_id shall precede any present coded slice
     data partition B NAL unit with the same value of
     slice_id. A coded slice data partition A NAL unit with
     a particular value of slice_id shall precede any
     present coded slice data partition C NAL unit with the
     same value of slice_id. When a coded slice data
     partition B NAL unit with a particular value of
     slice_id is present, it shall precede any present coded
     slice data partition C NAL unit with the same value of
     slice_id.

   - Otherwise (arbitrary slice order is not allowed), the
     order of coded slice of a non-IDR picture NAL units or
     coded slice data partition A NAL units shall be in the
     order of increasing macroblock address for the first
     macroblock of each coded slice of a non-IDR picture NAL
     unit or coded slice data partition A NAL unit. A coded
     slice data partition A NAL unit with a particular value
     of slice_id shall immediately precede any present coded
     slice data partition B NAL unit with the same value of
     slice_id. A coded slice data partition A NAL unit with
     a particular value of slice_id shall immediately
     precede any present coded slice data partition C NAL
     unit with the same value of slice_id when a coded slice
     data partition B NAL unit with the same value of
     slice_id is not present. When a coded slice data
     partition B NAL unit with the same value of slice_id is
     present, it shall immediately precede any present coded
     slice data partition C NAL unit with the same value of
     slice_id.

t) NAL units having nal_unit_type equal to 12 may be present
   in the access unit but shall not precede the first VCL
   NAL unit of the primary coded picture within the access
   unit.

u) NAL units having nal_unit_type equal to 0 or in the range
   of 24 to 31, inclusive, which are unspecified, may be
   present in the access unit but shall not precede the
   first VCL NAL unit of the primary coded picture within
   the access unit.

v) NAL units having nal_unit_type in the range of 19 to 23,
   inclusive, which are reserved, shall not precede the
   first VCL NAL unit of the primary coded picture within
   the access unit.
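
As an illustration of the slice data partition ordering
required under constraint s) when arbitrary slice order is
allowed (partition A before B, A before C, and B before C for
the same slice_id), a conformance check may be sketched as
follows. The function name and the reduction of each NAL unit
to a (nal_unit_type, slice_id) pair are assumptions made for
this sketch; the stricter "immediately precede" variant is not
covered.

```python
from typing import Dict, Iterable, Set, Tuple

PARTITION_A, PARTITION_B, PARTITION_C = 2, 3, 4  # NAL unit types of Table 1


def partitions_in_order(vcl_units: Iterable[Tuple[int, int]]) -> bool:
    """Check the A-before-B-before-C order per slice_id within one picture."""
    seen: Dict[int, Set[int]] = {}  # slice_id -> partition types seen so far
    for nal_type, slice_id in vcl_units:
        if nal_type not in (PARTITION_A, PARTITION_B, PARTITION_C):
            continue  # unpartitioned slices are not checked here
        earlier = seen.setdefault(slice_id, set())
        if nal_type in (PARTITION_B, PARTITION_C) and PARTITION_A not in earlier:
            return False  # B or C arrived before its partition A
        if nal_type == PARTITION_B and PARTITION_C in earlier:
            return False  # B arrived after C
        earlier.add(nal_type)
    return True
```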
The creation of the data stream 70 is further restricted by
the following constraints in order to enable the detection
of the first VCL NAL unit of a primary coded picture:

w) Any coded slice NAL unit or coded slice data partition A
   NAL unit of the primary coded picture of the current
   access unit shall be different from any coded slice NAL
   unit or coded slice data partition A NAL unit of the
   primary coded picture of the previous access unit in one
   or more of the following ways.

   - frame_num differs in value. frame_num is an identifier
     in each VCL NAL unit indicating the picture 12 of the
     video 10a it belongs to. A value of frame_num may be
     assigned to more than one picture or access unit, but
     the value of frame_num in the payload data of the VCL
     NAL units of successive access units 74 may not be the
     same. In other words, frame_num is used as a unique
     identifier for each short-term reference frame. For
     example, when the current picture is an IDR picture,
     frame_num shall be equal to zero.

   - field_pic_flag differs in value. field_pic_flag as
     contained in the payload data 80 of VCL NAL units
     specifies, if equal to one, that the slice is
     associated to a coded field, i.e. a field of an
     interlaced frame, and, if equal to zero, specifies that
     the picture to which the VCL NAL unit having that
     field_pic_flag belongs is a coded frame, i.e. a coded
     interleaved or coded progressive frame.

   - bottom_field_flag is present in both and differs in
     value. bottom_field_flag as contained in the payload
     data 80 of a VCL NAL unit specifies, if equal to one,
     that the slice is associated to a coded bottom field,
     whereas bottom_field_flag equal to zero specifies that
     the picture is a coded top field. To be more specific,
     a coded video sequence consists of a sequence of coded
     pictures, wherein a coded picture may represent either
     an entire frame or a single field. Generally, a frame
     of video can be considered to contain two interleaved
     fields, a top and a bottom field. The top field
     contains the even-numbered rows, whereas the bottom
     field contains the odd-numbered rows, for example.
     Frames in which the two fields of a frame are captured
     at different time instants are referred to as
     interlaced frames. Otherwise, a frame is referred to as
     a progressive frame.

   - nal_ref_idc differs in value with one of the
     nal_ref_idc values being equal to 0. nal_ref_idc is an
     identifier that may be contained in the payload data 80
     of a NAL unit. nal_ref_idc not equal to zero specifies
     that the content of the NAL unit contains a sequence
     parameter set or a picture parameter set or a slice of
     a reference picture or a slice data partition of a
     reference picture. Therefore, nal_ref_idc equal to zero
     for a NAL unit containing a slice or slice data
     partition indicates that the slice or slice data
     partition is part of a non-reference picture.
     nal_ref_idc shall not be equal to zero for a sequence
     parameter set or a picture parameter set in a NAL unit.
     If nal_ref_idc is equal to zero for one slice or slice
     data partition in a NAL unit of a particular picture,
     it shall be equal to zero for all slice and slice data
     partition NAL units of the picture. nal_ref_idc is,
     therefore, not equal to zero for IDR NAL units, i.e.
     NAL units with a nal_unit_type equal to 5. nal_ref_idc
     is equal to zero for all NAL units having a
     nal_unit_type equal to 6, 9, 10, 11 or 12.
     pic_order_cnt_type is a syntax element contained in
     payload data 80 in order to specify the method to code
     the syntax element picture order count. The value of
     pic_order_cnt_type shall be in the range of 0 to 2,
     inclusive. pic_order_cnt_type shall not be equal to 2
     in a sequence that contains two or more consecutive
     non-reference frames, complementary non-reference field
     pairs or non-paired non-reference fields in decoding
     order. pic_order_cnt_lsb specifies, when contained in
     the payload data 80 of a VCL NAL unit, the picture
     order count coded for the field of a coded frame or for
     a coded field. An IDR picture shall, for example, have
     pic_order_cnt_lsb equal to zero.
     delta_pic_order_cnt_bottom is a syntax element that
     specifies, when contained in the payload data 80 of a
     VCL NAL unit, the picture order count difference from
     the expected picture order count for the top field in a
     coded frame of a coded field.

frame_num is the same for both and
pic_order_cnt_type is equal to 1 for both and
either delta_pic_order_cnt[0] differs in value, or
delta_pic_order_cnt[1] differs in value.
delta_pic_order_cnt[0] specifies the picture order
count difference from the expected picture order
count for the top field in a coded frame or for a
coded field. delta_pic_order_cnt[1] specifies the
picture order count difference from the expected
picture order count for the bottom field in a
coded frame.

nal_unit_type is equal to 5 for both and
idr_pic_id differs in value. idr_pic_id is a
syntax element contained in payload data 80 of an
IDR picture in a VCL NAL unit and indicates an
identifier for different IDR pictures of different
coded video sequences 72.
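
The three conditions spelled out in this passage can be sketched as a comparison of two consecutive VCL NAL units. This is only an illustrative sketch: the class SliceInfo and the function starts_new_picture are hypothetical names, the syntax elements are assumed to have been parsed already, and the remaining conditions of the full list are not covered.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SliceInfo:
    """Hypothetical container for syntax elements of one VCL NAL unit."""
    nal_unit_type: int
    nal_ref_idc: int
    frame_num: int
    pic_order_cnt_type: int
    delta_pic_order_cnt: Tuple[int, int] = (0, 0)
    idr_pic_id: Optional[int] = None


def starts_new_picture(prev: SliceInfo, cur: SliceInfo) -> bool:
    """Apply only the three conditions described in this passage."""
    # nal_ref_idc differs in value, with one of the two values equal to 0.
    if (prev.nal_ref_idc != cur.nal_ref_idc
            and 0 in (prev.nal_ref_idc, cur.nal_ref_idc)):
        return True
    # frame_num is the same, pic_order_cnt_type is 1 for both, and
    # delta_pic_order_cnt[0] or delta_pic_order_cnt[1] differs.
    if (prev.frame_num == cur.frame_num
            and prev.pic_order_cnt_type == 1
            and cur.pic_order_cnt_type == 1
            and prev.delta_pic_order_cnt != cur.delta_pic_order_cnt):
        return True
    # Both are IDR NAL units (nal_unit_type 5) and idr_pic_id differs.
    if (prev.nal_unit_type == 5 and cur.nal_unit_type == 5
            and prev.idr_pic_id != cur.idr_pic_id):
        return True
    return False
```

A decoder or gateway would evaluate such a predicate for each incoming VCL NAL unit against the previous one, in addition to the other conditions of the list.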

After having described an embodiment for an encoder 18 and
its constraints for creation of a data stream 70, in the
following there is described a possible functionality of a
gateway 32 suitable for passing the data stream 70 of the
encoder of Fig. 3 on to a receiver 28.

The gateway 32 receives the data stream 70 NAL unit-wise at
step 120. At step 122, the gateway 32 investigates the type
number, i.e. nal_unit_type, of the current NAL unit 76 just
received in order to determine at step 124 whether this NAL
unit is of a NAL unit type to be removed. For example, the
NAL data stream 70 is of high performance and has several
redundant coded pictures 106. In this case, gateway 32
could decide to lower the redundancy level of the data
stream 70 and remove all NAL units 76 from data stream 70
that have NAL unit types 1 to 5 and, at the same time, a
syntax element in the payload data called redundant_pic_cnt
that is different from 0, wherein redundant_pic_cnt equal
to 0, in accordance with the present embodiment, indicates
slices and slice data partitions belonging to the primary
coded picture of an access unit. The reduction in
redundancy is advantageous if the transmission link 30
between gateway 32 and receiver 28 has a low bit error
rate.

Alternatively, gateway 32 decides to transmit sequence and
picture parameter set NAL units via an extra transmission
link (not shown in Fig. 2) to receiver 28. In this case,
gateway 32 removes all NAL units of NAL unit types 7 and 8.
Of course, it is possible that gateway 32 removes any
combination of removable NAL unit types.
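
The two removal policies just described can be sketched as a predicate over a single NAL unit. This is only a sketch under stated assumptions: the class NalUnit, the function names and the two policy flags are hypothetical and merely model the syntax elements named in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    """Hypothetical view of a NAL unit as seen by the gateway."""
    nal_unit_type: int
    redundant_pic_cnt: int = 0  # only meaningful for types 1 to 5


def is_redundant_slice(nal: NalUnit) -> bool:
    # NAL unit types 1 to 5 with redundant_pic_cnt different from 0.
    return 1 <= nal.nal_unit_type <= 5 and nal.redundant_pic_cnt != 0


def is_parameter_set(nal: NalUnit) -> bool:
    # NAL unit types 7 and 8 carry sequence and picture parameter sets.
    return nal.nal_unit_type in (7, 8)


def to_be_removed(nal: NalUnit, drop_redundant: bool,
                  params_out_of_band: bool) -> bool:
    """Combine both policies; each one can be switched on separately."""
    return ((drop_redundant and is_redundant_slice(nal))
            or (params_out_of_band and is_parameter_set(nal)))
```

Keeping the two policies as separate flags mirrors the remark that any combination of removable NAL unit types may be removed.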

If the NAL unit 76 received at step 120 is to be removed,
gateway 32 performs the removal of the current NAL unit
from the data stream 70 and discards this NAL unit at step
126. Otherwise, gateway 32 determines at step 128 whether
the NAL unit received at step 120 has to be transmitted to
the receiver 28 safely or has to be left unchanged. For
example, if the NAL unit just received is a parameter set
NAL unit, it has to be transferred to the receiver 28. In
this case, there are two possibilities for gateway 32. In
the first case, gateway 32 decides to transmit the
parameter set NAL unit via an extra transmission link. In
this case, gateway 32 removes, in step 130, the NAL unit
from the data stream 70 and then transmits, in step 132,
the NAL unit via the extra transmission link. In
particular, gateway 32 can perform the transmission of step
132 several times. Gateway 32 just has to comply with the
constraints on the order of sequence and picture parameter
set RBSPs and their activation at decoder side as mentioned
above (see I and II).

Alternatively, gateway 32 decides to transmit NAL units
containing the parameter sets in-band. In this case,
gateway 32 inserts, at step 134, the current NAL unit at
another position of the data stream 70 or, to be more
precise, at a preceding position of the NAL data stream 70.
Of course, step 134 may be performed several times. Gateway
32 thus has to guarantee that the constraints on the order
of sequence and picture parameter set RBSPs and their
activation at receiver 28 are obeyed (see constraints o and
p).

After any of steps 126, 128, 132 and 134, gateway 32
checks, at step 136, whether there are NAL units left in
the data stream 70. If this is the case, the next NAL unit
is received at step 120. Otherwise, the process of Fig. 6
ends and gateway 32 awaits the reception of the next NAL
data stream 70.
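
The loop of Fig. 6 can be summarised in a few lines. The following is a simplified sketch, not the embodiment itself: NAL units are modelled as (nal_unit_type, payload) pairs, the function name run_gateway and its parameters are hypothetical, and the in-band re-insertion of step 134 is left out.

```python
from typing import Callable, Iterable, List, Tuple

Nal = Tuple[int, bytes]          # (nal_unit_type, payload), an assumption
PARAMETER_SET_TYPES = (7, 8)     # sequence and picture parameter sets


def run_gateway(stream: Iterable[Nal],
                removable: Callable[[int], bool],
                out_of_band: bool) -> Tuple[List[Nal], List[Nal]]:
    """Return (in-band stream, units sent via the extra link)."""
    in_band: List[Nal] = []
    extra_link: List[Nal] = []
    for nal_type, payload in stream:              # steps 120 and 136
        if removable(nal_type):                   # steps 122 and 124
            continue                              # step 126: discard
        # Step 128: parameter sets must reach the receiver safely.
        if out_of_band and nal_type in PARAMETER_SET_TYPES:
            # Steps 130 and 132: take out of the stream, send separately.
            extra_link.append((nal_type, payload))
            continue
        in_band.append((nal_type, payload))       # left unchanged
    return in_band, extra_link
```

For example, with `removable=lambda t: t == 6` and `out_of_band=True`, NAL units of type 6 are dropped, parameter sets go to the second list and everything else stays in the first.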

In order to illustrate the decoupling of the transmission
of infrequently changing information from the transmission
of coded representations of the values of the samples in
the video pictures, the sequence and picture parameter set
mechanism is illustrated in Fig. 7. Fig. 7 shows the
encoder 18 and the receiving decoder 28. The data stream 70
is represented by an arrow. The data stream 70 passed from
encoder 18 to decoder 28 comprises a NAL unit with VCL data
that is encoded by means of a parameter set having a
pic_parameter_set_id of 3 as an address or index in the
slice header. As can be seen, the encoder 18 has generated
several picture parameter sets, wherein in Fig. 7 the
picture parameter sets having pic_parameter_set_id 1, 2 and
3, respectively, are shown representatively by small boxes
140. The transmission of the parameter set NAL units is
performed via an extra transmission link 142, which is
illustrated by a double-headed arrow labelled "reliable
parameter set exchange". In particular, the content of the
picture parameter set having pic_parameter_set_id of 3 is
shown at 144 in more detail for illustration purposes. The
picture parameter set having pic_parameter_set_id 3
contains information such as the video format used, i.e.
PAL, and the entropy coding scheme used, such as one of a
context adaptive binary arithmetic coding or a context
adaptive variable length (Huffman) coding. So, the NAL unit
with VCL data having pic_parameter_set_id as an index to
the parameter set NAL unit 144 does not have to contain all
the content of the parameter set NAL unit 144. Therefore,
the amount of data contained in the stream 70 can be
reduced. As mentioned above, the decoder 28 buffers the
incoming parameter sets and indexes same by the
pic_parameter_set_id in the current NAL units by use of the
above explained activation mechanism (see I and II).
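
The buffering and indexing of parameter sets at the decoder side can be pictured as a small lookup table. The sketch below is illustrative only: the class PictureParameterSetStore, its method names and the dictionary key "entropy_coding" are hypothetical, and the activation constraints I and II are not modelled.

```python
from typing import Dict


class PictureParameterSetStore:
    """Hypothetical decoder-side buffer for picture parameter sets."""

    def __init__(self) -> None:
        self._sets: Dict[int, Dict[str, str]] = {}

    def receive(self, pic_parameter_set_id: int,
                content: Dict[str, str]) -> None:
        # A later set with the same id replaces the earlier one.
        self._sets[pic_parameter_set_id] = dict(content)

    def activate(self, pic_parameter_set_id: int) -> Dict[str, str]:
        # Look up the set referenced by the id in the slice header.
        try:
            return self._sets[pic_parameter_set_id]
        except KeyError:
            raise LookupError(
                f"picture parameter set {pic_parameter_set_id} "
                "has not been received") from None


store = PictureParameterSetStore()
store.receive(3, {"entropy_coding": "CABAC"})
active = store.activate(3)   # the slice header carries only the id 3
```

Because the slice only carries the index, the parameter set content is transmitted once, possibly over the reliable extra link, rather than with every VCL NAL unit.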

With respect to Fig. 8, in the following an embodiment for
the functionality of receiver or decoder 28 is described.
At step 160, decoder 28 receives a NAL unit 76 of a NAL
unit data stream 70 which may have been modified by the
gateway 32 by the process described with respect to Fig. 6
relative to the original version of the data stream as
created by encoder 18. At step 162, the decoder 28 buffers
the NAL unit 76 in a buffer having a predetermined buffer
space exceeding a predetermined standardized minimum buffer
size known to the encoder. Next, at step 164, the decoder
28 detects the beginning of a new access unit. In other
words, the decoder 28 checks whether the NAL unit just
received at step 160 is the first of a new access unit.



The detection in step 164 is performed by use of the
aforementioned constraints on the order of NAL units and
coded pictures and the association to access units (see
a-f). In particular, the decoder 28 detects the beginning
of a new access unit if the NAL unit received at step 160
is the first of any of the following NAL units after the
last VCL NAL unit of a primary coded picture of the current
access unit:

- access unit delimiter NAL unit (when present)
- sequence parameter set NAL unit (when present)
- picture parameter set NAL unit (when present)
- SEI NAL unit (when present)
- NAL units with nal_unit_type in the range of 13 to 18,
  inclusive
- first VCL NAL unit of a primary coded picture (always
  present)
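
The rule above can be sketched as a function that splits a sequence of NAL units into access units. Several things are assumptions of this sketch rather than statements of the embodiment: the type numbers 6 (SEI) and 9 (access unit delimiter) are taken from H.264/AVC, the flag marking the first VCL NAL unit of a primary coded picture is assumed to be computed elsewhere, and the name split_access_units is hypothetical.

```python
from typing import Iterable, List, Tuple

VCL_TYPES = range(1, 6)
# 9: access unit delimiter, 7/8: parameter sets, 6: SEI, plus 13..18.
AU_START_NON_VCL = {6, 7, 8, 9} | set(range(13, 19))


def split_access_units(
        units: Iterable[Tuple[int, bool]]) -> List[List[int]]:
    """units: (nal_unit_type, is_first_vcl_of_primary_picture) pairs."""
    access_units: List[List[int]] = []
    current: List[int] = []
    seen_vcl = False   # a VCL NAL unit of the current access unit was seen
    for nal_type, first_vcl_of_primary in units:
        is_vcl = nal_type in VCL_TYPES
        starts_new = seen_vcl and (
            nal_type in AU_START_NON_VCL
            or (is_vcl and first_vcl_of_primary))
        if starts_new:
            access_units.append(current)
            current, seen_vcl = [], False
        current.append(nal_type)
        seen_vcl = seen_vcl or is_vcl
    if current:
        access_units.append(current)
    return access_units
```

Note that NAL units of types 10 to 12 never open a new access unit in this sketch, so they stay attached to the access unit whose VCL NAL units they follow.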

It is noted that the decoder 28 can detect the presence of
a last VCL NAL unit of a primary coded picture 100 by means
of the assumption that the payload data of all the VCL NAL
units of the primary coded picture 100 have to yield a
complete pre-coded version of one picture as well as by
means of the constraints mentioned above at (w).

When a new access unit has been detected (step 166), the
decoder 28 deallocates or flushes buffer space at step 168
by removing an old access unit stored in the buffer.
Thereupon, the decoder 28 makes available the picture
derived from the current access unit, i.e. the access unit
which precedes the new access unit just detected in step
164.

Otherwise, i.e. if no new access unit has been detected
(step 166), or after step 170, the decoder 28 decodes the
NAL unit received at step 160 in order to retrieve the
syntax elements contained therein.
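
The receive, buffer, detect and decode cycle of Fig. 8 can be sketched as follows. This is a strongly simplified illustration: the access unit boundary is assumed to be supplied as a flag with each NAL unit, the buffer holds a single access unit, a "picture made available" is represented by the list of NAL units of the completed access unit, and the names run_decoder and decode are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def run_decoder(stream: Iterable[Tuple[bytes, bool]],
                decode: Callable[[bytes], None]) -> List[List[bytes]]:
    """stream: (nal_unit, starts_new_access_unit) pairs."""
    completed: List[List[bytes]] = []   # access units made available
    current: List[bytes] = []           # buffer for the current access unit
    for nal, starts_new in stream:      # step 160: receive
        if starts_new and current:      # steps 164 and 166: new access unit
            completed.append(current)   # release the preceding access unit
            current = []                # step 168: free the buffer space
        current.append(nal)             # step 162: buffer the NAL unit
        decode(nal)                     # decode its syntax elements
    return completed
```

The last access unit stays in the buffer when the input ends, since in this sketch it is only released once the beginning of the next access unit has been detected.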

The process then loops back to step 160. As may have become
clear from the foregoing description, the decoder 28 is not
liable to a buffer overflow as long as (1) the encoder 18
has created a NAL unit data stream 70 with access unit
sizes that comply with the maximum buffer size and (2)
gateway 32 leaves the data stream 70 unchanged, merely
removes and discards removable and negligible NAL units
from the data stream 70, merely removes removable but
essential NAL units from the data stream 70 while
transmitting them via an extra transmission link or,
alternatively, inserts NAL units into access units only
such that the resulting access unit size does not result in
a buffer overflow at the decoder's side. In any case, by
the above-described constraints on the creation of the data
stream, the decoder 28 is capable of detecting the
beginning of a new access unit in a unique and exact way.
Therefore, it is possible for the encoder 18 and the
gateway 32 to forecast the buffer space consumption at
decoder side and, therefore, to avoid buffer space
overflow, provided the decoder has the minimum amount of
buffer space.
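
The forecast mentioned above can be illustrated with a simple size check. The sketch assumes, as a simplification, that the decoder buffer has to hold one complete access unit at a time and that sizes are counted in bytes; the function names and parameters are hypothetical.

```python
from typing import Sequence


def stream_is_safe(access_unit_sizes: Sequence[int],
                   min_buffer_size: int) -> bool:
    """Condition (1): every access unit fits into the minimum buffer."""
    return all(size <= min_buffer_size for size in access_unit_sizes)


def insertion_is_safe(access_unit_sizes: Sequence[int], index: int,
                      nal_size: int, min_buffer_size: int) -> bool:
    """Condition (2): inserting a NAL unit keeps the access unit in bounds."""
    return access_unit_sizes[index] + nal_size <= min_buffer_size
```

A gateway that wants to re-insert a parameter set NAL unit in-band could run such a check before choosing the access unit into which the NAL unit is placed.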

As may be clear from the above, the present invention is
not restricted to multimedia, video or audio signals.
Moreover, it is noted with respect to Fig. 2 that other
constellations in which the present invention could be used
are also possible. For example, more than one gateway 32
could be interposed between the data stream presentation
(encoder) and the decoder. With respect to Fig. 6, it is
noted that the gateway 32 does not have to implement all
of the options shown in Fig. 6. For example, a gateway
could be designed to implement merely the removal of NAL
units from the data stream without implementing steps 128
to 134. Alternatively, a gateway could implement all steps
of Fig. 6 except step 134 or all steps except 130 and 132.

With regard to the decoder of Fig. 8, it is noted that the
buffering management described there helps in standardizing
the data stream format of the data stream 70 of that
embodiment. Nevertheless, the buffer management may be
realized in a different way, for example by de-allocating
buffer space in other units than access units.

In other words, in accordance with the above embodiments
each syntax element is placed into a logical packet called
a NAL unit. Rather than forcing a specific bitstream
interface to the system as in prior video standards, the
NAL unit syntax structure allows greater customization of
the method of carrying the video content in a manner
appropriate for each specific network. In particular, the
above embodiment defines how NAL units are to be ordered
within access units. The constraints formulated on the
order of NAL units specify the decoding order that must be
accepted by a standard-conforming decoder, allowing a novel
degree of freedom. Moreover, the ordering of the NAL units
and their arrangement specifies access units and makes the
distinction between various access units possible even if
NAL units that are allowed to be removed from the bitstream
are removed.

The above embodiments permit, by their new way of defining
the decoding order, an increased degree of flexibility that
is especially important in internet applications where each
NAL unit is typically transported in one packet and
shuffling is likely to occur. This permits simpler decoder
implementations.

The distinction between various access units even if units
that are allowed to be removed from the bitstream are
removed permits a flexible rate shaping and transcoding of
data and makes the method robust to transmission errors.
The automatic distinction method also increases coding
efficiency by making start codes or delimiter codes
superfluous.

Depending on an actual implementation, the inventive
encoding/decoding/converting methods can be implemented in
hardware or in software. Therefore, the present invention
also relates to a computer program, which can be stored on
a computer-readable medium such as a CD, a disk or any
other data carrier. The present invention is, therefore,
also a computer program having a program code which, when
executed on a computer, performs the inventive method of
encoding, converting or decoding described in connection
with the above figures.

While this invention has been described in terms of several
preferred embodiments, there are alterations, permutations,
and equivalents which fall within the scope of this
invention. It should also be noted that there are many
alternative ways of implementing the methods and
compositions of the present invention. It is therefore
intended that the following appended claims be interpreted
as including all such alterations, permutations, and
equivalents as fall within the true spirit and scope of the
present invention.

Furthermore, it is noted that all steps indicated in the
flow diagrams are implemented by respective means in the
encoder, gateway or decoder, respectively, and that the
implementations may comprise subroutines running on a CPU,
circuit parts of an ASIC or the like.
